Changes in the microbial community of semen exposed to different simulated forensic situations

ABSTRACT Semen is a common body fluid in sexual crime cases. Current methods of semen identification have certain limitations, so alternative methods are needed. In addition, few studies have reported microbiome changes in body fluids under simulated crime scene conditions, so the changes in the semen microbiome after exposure to various simulated crime scenes need to be characterized. Semen samples from eight volunteers were exposed in closed plastic bags, in soil, in an indoor environment, and on cotton, polyester, and wool fabrics. A total of 68 samples (before and after exposure, plus controls) were collected, analyzed by 16S rDNA sequencing, and examined for microbiome signatures. Finally, a random forest model was constructed for body fluid identification. After exposure, the relative abundances of Pseudomonas and Rhodococcus changed dramatically in almost all groups. Exposure in closed plastic bags or soil had a greater impact on the semen microbiome than the other conditions: according to the Shannon index, the alpha diversity of the closed plastic bag and soil groups was much lower than that of the other groups. These two scenarios therefore deserve particular attention in forensic practice. In this study, the accuracy of semen identification was 100%, and exposed semen could still be correctly identified as semen based on its microbiota characteristics. In summary, the microbiome of semen exposed to simulated crime scenes retains good application potential for body fluid identification. IMPORTANCE In this study, the microbiome changes of semen exposed to different environments were observed, and the exposed semen microbiome still has good application potential for body fluid identification.

In addition, microorganisms show some potential for tracing the origin of tissues and body fluids in forensic medicine (6). At present, studies of body fluid identification based on human microbial community composition have mainly used fresh body fluids (7), and few studies have examined tissues or body fluids after environmental exposure. In practice, however, body fluids are usually exposed to the environment for some time before they are collected. Dobay et al. (8) and our previous studies (9) showed that the microbial characteristics of semen were relatively stable after exposure. However, the design of these experiments was relatively simple: swabs were merely placed on a test tube rack and exposed to the indoor environment for a certain period. In real cases, semen is usually left on bed sheets, on clothes, or in plastic bags and is exposed to the crime scene environment. Therefore, to better approximate actual casework, the experimental design needs to be extended by simulating a variety of exposure environments that may occur in actual cases and exploring the effects of additional exposure factors.
In this study, semen samples from eight healthy individuals were collected and exposed for 15 days in closed plastic bags, in an indoor environment, in soil, and on cotton, wool, and polyester fiber carriers. Using 16S rRNA gene sequencing, we explored the changes in the semen microbiota after exposure to these different environments.

Sample collection
This study was approved by the Biomedical Ethics Committee of Southern Medical University, Guangzhou, China. After obtaining informed consent, we collected semen samples from eight individuals. None of the volunteers had obvious urogenital complications or symptoms of sexually transmitted diseases, and none had taken antibiotics in the preceding month. The hands and glans were thoroughly disinfected with alcohol before sampling. After 3 days of abstinence (no masturbation or sexual intercourse), semen was collected by masturbation. The semen collected from each volunteer was divided into seven 0.5 mL portions, which were assigned to the following groups (10). Positive control group: one portion was placed in a sterile test tube and immediately stored at −80°C (5). Treatment groups: semen was dropped onto sterilized carriers of different materials (cotton, polyester, and wool fabric), or onto sterile cotton swabs that were then placed on soil, in closed plastic bags (CPB), or in the indoor environment. The treatment samples were then exposed for 15 days (Fig. 1). In addition, the negative control consisted of sterilized carriers and sterile cotton swabs exposed to the same environmental conditions as the treatment groups, whereas the blank control consisted of sterilized carriers and swabs that were not subjected to any exposure. The ambient temperature during exposure ranged from 20°C to 25°C.

DNA extraction, PCR amplification, and sequencing
According to the manufacturer's instructions, total DNA was extracted with the E.Z.N.A. Soil Omega Kit (Omega Bio-Tek, Norcross, Georgia, USA). DNA quality was checked by 1% agarose gel electrophoresis, and the DNA was stored at −20°C. DNA concentration was measured with a Qubit dsDNA HS Assay Kit on a Qubit 2.0 fluorometer. For all qualified DNA samples, the V3-V4 region of the 16S rRNA gene was amplified with primers 338F (5′-ACTCCTACGGGAGGCAGCAG-3′) and 806R (5′-GGACTACHVGGGTWTCTAAT-3′). Each PCR (20 µL) contained 4 µL of 5× FastPfu buffer, 2 µL of 2.5 mM dNTPs, 0.8 µL of each primer (5 µM), 0.4 µL of FastPfu polymerase, and 10 ng of template DNA. The cycling conditions were an initial step of 95°C for 3 minutes, followed by 27 cycles of 95°C (30 s), 55°C (3 s), and 72°C (30 s), and a final incubation at 72°C for 10 minutes. Amplification was performed on a GeneAmp 9700 thermal cycler (ABI, USA). After PCR, amplicon concentrations were normalized and pooled using a SequalPrep DNA Normalization Plate (Invitrogen, Maryland, USA). Libraries were constructed from the purified PCR products with the NEXTFLEX Rapid DNA-Seq Kit. After library quality control and quantification, sequencing was performed with the MiSeq kit v3 (Illumina, San Diego, California, USA) on the Illumina MiSeq PE300 platform. The raw data were deposited in the NCBI SRA database (accession number: PRJNA1102678).
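The reaction recipe above can be checked with simple C1V1 = C2V2 arithmetic. The Python sketch below is illustrative only: the per-reaction volumes and stock concentrations come from the text, whereas the template volume, the 10% pipetting overage, and the helper names (`final_concentration`, `master_mix`) are hypothetical assumptions, not part of the published protocol.

```python
"""Back-of-envelope check of the 20 uL FastPfu PCR recipe described above.

Volumes and stock concentrations follow the Methods; the template volume and
pipetting overage passed to master_mix() are illustrative assumptions.
"""

REACTION_VOLUME_UL = 20.0

# component -> (volume per reaction in uL, stock concentration, unit of stock)
COMPONENTS = {
    "5x FastPfu buffer": (4.0, 5.0, "x"),
    "dNTPs": (2.0, 2.5, "mM"),
    "primer 338F": (0.8, 5.0, "uM"),
    "primer 806R": (0.8, 5.0, "uM"),
}
POLYMERASE_UL = 0.4


def final_concentration(volume_ul: float, stock: float) -> float:
    """Solve C1*V1 = C2*V2 for the concentration C2 in the 20 uL reaction."""
    return stock * volume_ul / REACTION_VOLUME_UL


def master_mix(n_reactions: int, template_ul: float, overage: float = 0.1) -> dict:
    """Volumes (uL) of each component for n reactions plus a fractional overage.

    Water fills whatever remains of each 20 uL reaction after the fixed
    components and the (hypothetical) template volume are accounted for.
    """
    fixed_ul = sum(vol for vol, _, _ in COMPONENTS.values()) + POLYMERASE_UL
    water_ul = REACTION_VOLUME_UL - fixed_ul - template_ul
    if water_ul < 0:
        raise ValueError("components exceed the 20 uL reaction volume")
    scale = n_reactions * (1.0 + overage)
    mix = {name: vol * scale for name, (vol, _, _) in COMPONENTS.items()}
    mix["FastPfu polymerase"] = POLYMERASE_UL * scale
    mix["water"] = water_ul * scale
    return mix


if __name__ == "__main__":
    for name, (vol, stock, unit) in COMPONENTS.items():
        print(f"{name}: {final_concentration(vol, stock):.2f} {unit} final")
    # Hypothetical: 68 reactions, 1 uL template each, 10% overage.
    for name, vol in master_mix(68, template_ul=1.0).items():
        print(f"{name}: {vol:.1f} uL")
```

With these volumes, the final concentrations work out to 1× buffer, 0.25 mM dNTPs, and 0.2 µM of each primer; the fixed components occupy 8 µL, leaving 12 µL for template and water.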

Bioinformatics analysis
FLASH software (version 1.2.11) was used to merge the paired-end reads, and QIIME (version 1.9.1) was used to generate the taxonomic abundance tables and calculate beta diversity distances. OTU clustering was performed with UPARSE (version 7.1) (http://drive5.com/uparse/), and an OTU table was generated at a similarity threshold of 97%. Taxonomic classification was performed with the RDP classifier (version 2.11) (https://sourceforge.net/projects/rdp-classifier/) against the SILVA database (version 138). Mothur software (version 1.30.2) was used to calculate the alpha diversity and evenness indices of the microbial communities. R software (version 3.3.1) was used to generate rarefaction curves, Venn diagrams, barplots, PCoA plots, and other diagrams. The stats package in R (version 3.3.1) and the scipy package in Python were used for the Wilcoxon rank sum test to identify taxa whose abundances differed significantly between semen sample groups, with P values adjusted by multiple-testing correction methods. LEfSe (R version 3.3.1) was used to perform linear discriminant analysis (LDA) to identify significant differences among the semen treatment groups.
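Genus-level tables, such as those used for the barplots and the random forest model, are obtained by summing the read counts of OTUs that share a genus assignment and normalizing by the sample's total reads. The sketch below illustrates that idea in plain Python; it is not QIIME's implementation, and the OTU identifiers, the `genus_relative_abundance` helper, and the pooling of unassigned OTUs as "unclassified" are assumptions for illustration.

```python
from collections import defaultdict


def genus_relative_abundance(otu_counts, otu_genus):
    """Collapse one sample's OTU read counts to genus-level relative abundance.

    otu_counts: dict mapping OTU id -> read count in the sample.
    otu_genus:  dict mapping OTU id -> genus name from the taxonomic classifier;
                OTUs missing from this mapping are pooled as "unclassified".
    """
    genus_counts = defaultdict(int)
    for otu, count in otu_counts.items():
        genus_counts[otu_genus.get(otu, "unclassified")] += count
    total = sum(genus_counts.values())
    if total == 0:
        return {}
    return {genus: count / total for genus, count in genus_counts.items()}


# Hypothetical sample: two OTUs classified as Pseudomonas, one as Rhodococcus.
sample = {"OTU1": 600, "OTU2": 200, "OTU3": 200}
taxonomy = {"OTU1": "Pseudomonas", "OTU2": "Pseudomonas", "OTU3": "Rhodococcus"}
print(genus_relative_abundance(sample, taxonomy))
# {'Pseudomonas': 0.8, 'Rhodococcus': 0.2}
```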

Machine learning process
A random forest (RF) model based on the genus-level relative abundances of microorganisms was constructed in R software (version 3.3.1). Fifty-six semen samples from this study (the positive control and treatment groups) were included. In addition, 27 skin samples, 30 saliva samples, and 66 vaginal secretion samples were obtained from our previously published studies (11,12). The model therefore included a total of 179 samples, of which 70% (126) were randomly selected as the training set and 30% (53) as the test set. The model was built with 500 decision trees. The mean decrease in accuracy was used to evaluate the importance of each genus as an indicator; the higher the value, the more important the genus was in the classification model. The model was validated by 10-fold cross-validation, and the number of important features retained was the number at which the cross-validated prediction error rate was lowest.
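Two bookkeeping steps in this procedure are easy to make concrete: the random 70/30 partition of the 179 samples and the choice of the feature count with the lowest cross-validated error. The Python sketch below covers only those steps under stated assumptions (the test size is rounded down, ties are broken toward fewer features, and the seed and error values are hypothetical); the random forest itself was fitted in R and is not reproduced here.

```python
import math
import random


def train_test_split_indices(n_samples, test_fraction=0.3, seed=2024):
    """Randomly partition sample indices into training and test sets.

    Assumption: the test size is floor(test_fraction * n_samples);
    the seed is a hypothetical value chosen for reproducibility.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = math.floor(test_fraction * n_samples)
    return indices[n_test:], indices[:n_test]


def best_feature_count(cv_error_by_k):
    """Number of top-ranked features k with the lowest cross-validated error.

    Ties are broken toward the smaller k, i.e. the fewest features that
    reach the minimum error.
    """
    return min(cv_error_by_k, key=lambda k: (cv_error_by_k[k], k))


# 56 semen + 27 skin + 30 saliva + 66 vaginal secretion samples
train, test = train_test_split_indices(56 + 27 + 30 + 66)
print(len(train), len(test))  # 126 53

# Hypothetical cross-validated error rates for different numbers of top features
errors = {10: 0.08, 30: 0.05, 100: 0.02, 150: 0.02}
print(best_feature_count(errors))  # 100
```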

Summary of semen samples
Semen samples were collected from eight unrelated men and assigned to the following groups: positive control (8 samples), six treatment groups (48 samples), negative control (6 samples), and blank control (6 samples). The V3-V4 regions of the 16S rRNA genes of these 68 samples were PCR-amplified and subjected to high-throughput sequencing. The blank controls showed no amplification products by gel electrophoresis, and sequencing of their PCR products yielded no data. PCR amplification and sequencing succeeded for 100% of the positive control, treatment, and negative control samples. In total, 2,507,723 valid sequences were obtained from these 62 samples. Clustering of the sequences at 97% similarity yielded 5,569 OTUs, which were classified into 1 domain, 1 kingdom, 45 phyla, 131 classes, 315 orders, 547 families, 1,247 genera, and 2,232 species (Table S1). Good's coverage of the observed OTUs was 0.98 ± 0.02 (mean ± SD), and the rarefaction curves approached saturation with increasing read number, indicating that the sequencing depth was sufficient to capture the vast majority of microbial taxa in the samples (Fig. S1).
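Both quality checks mentioned here have simple closed forms. Good's coverage is commonly defined as C = 1 − F1/N (F1 = number of singleton OTUs, N = total reads), and a rarefaction curve plots the expected number of OTUs observed in random subsamples of increasing depth. The sketch below assumes these textbook definitions (the pipeline used in this study may differ in detail) and uses exact binomial coefficients, which is fine for toy data but slow for real read depths.

```python
from math import comb


def goods_coverage(counts):
    """Good's coverage C = 1 - F1 / N for one sample's OTU counts."""
    n_reads = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / n_reads


def rarefied_richness(counts, depth):
    """Expected number of OTUs seen when `depth` reads are drawn without
    replacement: sum over OTUs of 1 - C(N - n_i, depth) / C(N, depth)."""
    n_reads = sum(counts)
    if not 0 <= depth <= n_reads:
        raise ValueError("depth must be between 0 and the total read count")
    total = comb(n_reads, depth)
    # math.comb returns 0 when depth > N - n_i, i.e. the OTU is always seen.
    return sum(1 - comb(n_reads - c, depth) / total for c in counts if c > 0)


# Hypothetical sample with four OTUs (two of them singletons), 10 reads in total.
otus = [5, 3, 1, 1]
print(goods_coverage(otus))  # 0.8
curve = [rarefied_richness(otus, d) for d in range(0, 11)]
print(round(curve[-1], 6))   # 4.0: all OTUs are seen at full depth
```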

Alpha diversity analysis
According to the Chao index, the closed plastic bag (CPB), soil, and negative control groups had relatively low values that differed significantly from those of the other groups (P < 0.05) (Fig. 2A). According to the Shannon index, the positive control group had the highest value, which differed significantly from those of all other groups except the indoor group (P < 0.05), and the CPB group had the lowest value (P < 0.05) (Fig. 2B). The Shannon indices of the indoor, cotton, polyester, and wool fabric groups were similar (Table S2). Taken together, the Chao and Shannon indices and their between-group differences show that community richness and diversity were lowest in the CPB group, followed by the soil group, and highest in the positive control group. There were no significant differences in community diversity among the indoor, cotton, polyester, and wool fabric groups.

FIG 2 (A) Chao index and (B) Shannon index of each group. "Cpb," "soil," "indoor," "co," "po," and "wo" represent the treatment groups in which semen was exposed in closed plastic bags, in soil, in the indoor environment, and on cotton, polyester, and wool fabric carriers, respectively. 0.01 < P ≤ 0.05 is marked as *, 0.001 < P ≤ 0.01 as **, and P ≤ 0.001 as ***.
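For reference, the two indices compared in Fig. 2 can be computed from a sample's OTU counts as sketched below. The sketch assumes the natural-log Shannon index and the bias-corrected Chao1 estimator; mothur's own implementation may differ in details (for example, its handling of samples without doubletons), so the values are illustrative rather than a reproduction of Table S2.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)),
    where F1 and F2 are the numbers of singleton and doubleton OTUs."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical OTU counts: an even community versus one dominated by a single OTU.
even = [25, 25, 25, 25]
dominated = [97, 1, 1, 1]
print(round(shannon_index(even), 3), round(shannon_index(dominated), 3))
print(chao1([1, 1, 1, 2, 5]))  # 5 observed + 3*2/(2*2) = 6.5
```

The "dominated" example mimics, in miniature, the shift toward a few dominant genera described for the CPB and soil groups: with the same number of OTUs, the Shannon index drops sharply.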

Compositional analysis of semen samples and significant difference analysis
The community barplot at the phylum level showed that the microorganisms were mainly Proteobacteria, Actinobacteriota, and Firmicutes (Fig. S2A). Except in the CPB and soil groups, the phylum-level community composition was similar before and after semen exposure, especially in the indoor, cotton, polyester, and wool fabric groups (Fig. 3A). Venn diagram analysis at the phylum level revealed 12 (66.67%) phyla shared by the indoor, cotton, polyester, and wool fabric groups (Fig. 3B). At the genus level, the community barplot showed that the main bacterial genera were Pseudomonas, Rhodococcus, Staphylococcus, Shigella, and Enterococcus (Fig. S2B). The relative abundance of Pseudomonas was low in the positive control group but increased significantly after exposure in all groups except the CPB group, most markedly in the soil group. Rhodococcus changed significantly after exposure in the CPB and soil groups, where its relative abundance fell sharply to nearly zero. In addition, the relative abundances of Ralstonia, Sphingomonas, and Prevotella decreased after exposure, especially in the CPB and soil groups. Compared with the positive control group, the relative abundance of Staphylococcus increased significantly to 49.61% in the CPB group, and that of Bacillus increased significantly to 13.14% in the soil group (Fig. 3C). Venn diagram analysis at the genus level revealed 72 (27.69%) genera shared by the indoor, cotton, polyester, and wool fabric groups (Fig. 3D).
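The percentages attached to the Venn counts are consistent with dividing the number of taxa shared by all four groups by the number of taxa detected in any of them (12/18 ≈ 66.67% for phyla; 72/260 ≈ 27.69% for genera). That denominator is inferred from the reported numbers rather than stated in the Methods; the sketch below, with hypothetical taxon sets, shows the calculation under that assumption.

```python
def shared_fraction(*groups):
    """Taxa present in every group divided by taxa present in any group.

    Assumption: the Venn percentages use the union of all groups as the
    denominator (this matches 12/18 and 72/260 but is not stated in the text).
    """
    sets = [set(g) for g in groups]
    union = set().union(*sets)
    shared = set.intersection(*sets)
    return len(shared) / len(union) if union else 0.0


# Hypothetical phylum sets: 12 phyla common to all groups, 6 group-specific ones.
common = {f"phylum_{i}" for i in range(12)}
indoor = common | {"extra_a", "extra_b"}
cotton = common | {"extra_c"}
polyester = common | {"extra_d", "extra_e"}
wool = common | {"extra_f"}
print(f"{shared_fraction(indoor, cotton, polyester, wool):.2%}")  # 66.67%
```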
Based on the community abundance data, the Kruskal-Wallis H test was used to identify taxa whose abundances differed among the groups and to evaluate the significance of these differences. The Kruskal-Wallis H test barplot showed that the relative abundances of Pseudomonas, Rhodococcus, Staphylococcus, Ralstonia, Sphingomonas, Prevotella, and Bacillus differed significantly among groups (Fig. 4A).
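The Kruskal-Wallis H test ranks all observations of one taxon's relative abundance across groups and asks whether the mean ranks differ more than expected by chance. The pure-Python sketch below computes H with the usual tie correction and, because many taxa are tested at once, includes a Benjamini-Hochberg adjustment as one example of P-value correction; the choice of Benjamini-Hochberg is an assumption, since the text does not name the correction used. The P value for H would normally come from a chi-square distribution with k − 1 degrees of freedom (for example, `scipy.stats.chi2.sf(h, k - 1)`).

```python
def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic (tie-corrected) for two or more groups."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_term, start = 0.0, 0
    for g in groups:
        group_ranks = ranks[start:start + len(g)]
        rank_term += sum(group_ranks) ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * rank_term - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n) over tie groups of size t.
    tie_sizes = {}
    for x in pooled:
        tie_sizes[x] = tie_sizes.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_sizes.values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0  # all values tied: H = 0


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # from the largest P value down
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical relative abundances of one genus in three groups.
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))
```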
The LEfSe method was used to screen for microbial taxa that differed significantly among groups. With the LDA threshold set to 4, no significantly enriched taxa were found in the indoor exposure group or in the groups exposed on the three fabric carriers. Ralstonia, Sphingomonas, and Prevotella were significantly enriched in the positive control group; Rhodococcus in the negative control group; Staphylococcus in the CPB group; and Pseudomonas and Bacillus in the soil exposure group (Fig. 4B and C).

Beta diversity analysis
Beta diversity analysis was used to compare community composition among samples from different groups and to assess their similarities and differences.
Cluster analysis of all samples was carried out based on Bray-Curtis distances. The hierarchical clustering tree at the OTU level showed that the soil group samples clustered together, some samples of the CPB group and of the positive control group formed separate clusters, and the samples exposed on the three fabric materials clustered with some samples of the indoor group (Fig. 5A).
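Bray-Curtis dissimilarity between two abundance profiles is the sum of absolute differences divided by the sum of all abundances, and a hierarchical clustering tree is built by repeatedly merging the closest clusters. The text does not state the linkage rule, so the sketch below assumes average linkage (UPGMA), a common choice for such trees; the sample names and profiles are hypothetical.

```python
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum(|u_i - v_i|) / sum(u_i + v_i)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def upgma(names, profiles):
    """Average-linkage (UPGMA) clustering of samples on Bray-Curtis distances.

    names:    list of unique sample names.
    profiles: dict mapping sample name -> list of taxon abundances.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where each cluster is a tuple of sample names.
    """
    dist = {}
    for a, b in combinations(names, 2):
        dist[a, b] = dist[b, a] = bray_curtis(profiles[a], profiles[b])

    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for x, y in combinations(clusters, 2):
            # Average linkage: mean distance over all cross-cluster sample pairs.
            d = sum(dist[a, b] for a in x for b in y) / (len(x) * len(y))
            if best is None or d < best[2]:
                best = (x, y, d)
        x, y, d = best
        clusters = [c for c in clusters if c not in (x, y)] + [x + y]
        merges.append(best)
    return merges


profiles = {
    "soil_1": [0.70, 0.20, 0.10],
    "soil_2": [0.65, 0.25, 0.10],
    "cpb_1": [0.05, 0.05, 0.90],
}
for left, right, dist in upgma(list(profiles), profiles):
    print(left, "+", right, f"at {dist:.3f}")  # the two soil samples merge first
```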
In the non-metric multidimensional scaling (NMDS) plot, points with different colors or shapes represent samples from different groups, and the closer two points are, the more similar the species composition of the two samples. At the OTU level, NMDS roughly distinguished the samples of different groups, but the separation between groups was not pronounced (Fig. 5B). In particular, the semen samples exposed on the three fabric materials overlapped substantially and were difficult to differentiate (Fig. 5C).

Differences among the microbial communities of the body fluids
Comparing the microbiota of semen samples (before and after exposure) with those of skin, saliva, and vaginal fluid samples from our previously published studies (11,12), we found distinct differences in microbial composition among the body fluids (Fig. 6A). The predominant bacterial genera were Pseudomonas in semen, Cutibacterium in skin, Streptococcus in saliva, and Lactobacillus in vaginal fluid (Fig. 6B). Samples of each body fluid clustered together, and the different body fluids were clearly separated (Fig. 6C).

Using a random forest model to identify body fluids
Semen, skin, saliva, and vaginal secretions are common samples in sexual assault cases. Therefore, 27 skin samples, 30 saliva samples, and 66 vaginal fluid samples were analyzed together with the 56 semen samples from this study. In 10-fold cross-validation, the prediction error rate of the model reached its minimum of 0.02 when at least the top 100 important features were used (Fig. 7A). Features that contributed substantially to model performance included Pseudomonas, Lactobacillus, Cutibacterium, Corynebacterium, and Staphylococcus, among other genera (Fig. 7B). Using the top 100 important features, the random forest model output the probability of each body fluid for every test sample. All 53 samples in the test set were correctly classified, giving an accuracy of 100% (Table 1).
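In a random forest, the per-class probability reported for a sample is typically the fraction of trees that vote for that class, and the predicted body fluid is the class with the highest probability (the bold values in Table 1). The sketch below illustrates that aggregation and the accuracy calculation with hypothetical votes from 500 trees; it assumes vote-fraction probabilities, as returned for classification by R's randomForest, and does not retrain the study's model.

```python
from collections import Counter

CLASSES = ("semen", "skin", "saliva", "vaginal fluid")


def vote_probabilities(tree_votes, classes=CLASSES):
    """Per-class probability = fraction of trees that voted for that class."""
    counts = Counter(tree_votes)
    return {c: counts.get(c, 0) / len(tree_votes) for c in classes}


def predict(probabilities):
    """Predicted class = highest probability (ties go to the first class listed)."""
    return max(probabilities, key=probabilities.get)


def accuracy(true_labels, predicted_labels):
    """Fraction of samples whose predicted class matches the true class."""
    correct = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return correct / len(true_labels)


# Hypothetical votes of 500 trees for one exposed semen sample.
votes = ["semen"] * 480 + ["vaginal fluid"] * 15 + ["saliva"] * 5
probs = vote_probabilities(votes)
print(probs["semen"], predict(probs))  # 0.96 semen
print(accuracy(["semen"] * 53, ["semen"] * 53))  # 1.0, i.e. 53/53 correct
```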

DISCUSSION
Several studies have shown that semen microbiota can be used to identify body fluids, but few have examined exposed semen. Moreover, existing studies of semen exposure have been relatively simple and do not meet the needs of forensic practice. The dominant phyla in the positive control group were Proteobacteria (40%), Actinobacteriota (27%), Firmicutes (20%), and Bacteroidetes (8%), similar to the findings for semen samples from healthy men reported by Chen et al. (10) and Yang et al. (13). In the positive control group, the relative abundances of Rhodococcus, Escherichia-Shigella, Ralstonia, Sphingomonas, and Prevotella were high at the genus level, which differs slightly from the findings of other studies (10,14,15). However, Hou et al. observed that the species composition of semen communities varied widely among men, suggesting that each individual may have a unique, personalized seminal bacterial community (14). Previous research has also shown that seminal microbial communities are significantly more diverse than vaginal communities and that most semen samples lack a predominant microorganism (16). Lactobacillus has been reported in semen: it was the most abundant genus (19.9%) in one study (15), and it accounted for 6.79% of OTUs in the healthy male group of another (10), both of which differ from our result (1%). Lactobacillus also did not reach a high relative abundance in the semen microbiome in Dobay's study (8).
In addition, large numbers of lactobacilli have been detected in female vaginal secretions (17), and sexual intercourse between men and women can change the semen microbiome (14). Mändar et al. found that the composition of the seminal microbiota differed according to sexual experience (18). One possible explanation is that most of the volunteers in this study had no sexual experience, so no microorganisms had been transferred from a partner. This may explain why only a small amount of Lactobacillus was present in some semen samples of the positive control group.
At real crime scenes, semen samples are always exposed to microbe-laden environments, so the relationship between semen microbes and environmental microbes must be considered. Semen found at a crime scene may have been left in various settings for some time before discovery, such as on clothing and bed sheets, in sealed garbage bags, or in soil. This study therefore simulated exposure scenarios likely to be encountered at crime scenes and characterized the changes in the semen microbial community on different fabric materials, in closed packaging, and against a multi-microbial background, aiming to provide a more scientific and rigorous basis for forensic analysis.
The results showed that community diversity in each exposure group decreased significantly compared with that of the positive control group, similar to the findings of previous studies (9). Particular attention should be given to the closed plastic bag and soil environments. Community richness and diversity were lowest in the CPB group, followed by the soil group. These two environments may be unfavorable to the survival and growth of many semen microorganisms, reducing both the number of taxa and their abundance. The LEfSe results showed that Staphylococcus was significantly enriched in the CPB group, while Pseudomonas and Bacillus were significantly enriched in the soil exposure group. In both groups, the community shifted from high diversity to dominance by a few bacterial genera. This suggests that in actual cases, when a suspect places semen in a closed plastic bag or semen is exposed to soil, the microbial community will change significantly and become dominated by microorganisms adapted to that environment. This pattern may help infer the environment from which a semen sample was recovered. In addition, several bacterial genera with marked changes deserve attention. After exposure, the abundance of Pseudomonas increased significantly in all groups except the CPB group, most markedly in the soil group. Rhodococcus decreased sharply in the CPB and soil groups. The abundances of Ralstonia, Sphingomonas, and Prevotella also decreased after exposure, especially in the CPB and soil groups. Staphylococcus is a highly adaptable symbiotic organism, and its ability to thrive under anaerobic conditions is particularly significant here (19): after exposure in closed plastic bags, its relative abundance increased significantly to 49.61%. After exposure to soil, the relative abundance of Bacillus increased significantly to 13.14%. Bacillus comprises aerobic or facultatively anaerobic bacteria, most of which are saprophytic (20). Bacillus species are ubiquitous in nature, occurring mainly in soil, on plant surfaces, and in water, and their highly resistant spores allow them to tolerate a variety of adverse environments (21). The relative abundances of these bacteria differed significantly among the groups.
There was no significant difference in community diversity among the indoor exposure, cotton, polyester, and wool fabric groups. Venn diagram analysis at the phylum level showed that 66.67% of phyla were shared by the indoor, cotton, polyester, and wool fabric groups. In the OTU-level hierarchical clustering tree, the samples exposed on the three fabric materials clustered with some samples of the indoor group. The NMDS plot also showed that the distances between these groups were too small to distinguish them. Together, these results suggest that neither exposure on a fabric carrier nor the type of fabric had an obvious impact on the semen microbiota.
Although NMDS at the OTU level allowed a rough discrimination of samples from different exposure groups, the separation between groups was not obvious. Furthermore, each body fluid has a distinctive microbial community structure, and the dominant bacterial genera play a pivotal role in distinguishing among body fluids; genera such as Pseudomonas, Lactobacillus, Cutibacterium, Corynebacterium, and Staphylococcus contributed significantly to this differentiation. On this basis, a random forest model was built to distinguish environmentally exposed semen from skin, saliva, and vaginal fluid. The model identified semen with 100% accuracy, indicating that semen exposed to these environments retains a sufficiently stable microbiota to be identified by a random forest model. Overall, RF models based on 16S rRNA gene sequencing show good application potential for body fluid identification.
Although these results are encouraging, several limitations must be acknowledged. The samples were exposed for only 15 days, and the effects of longer exposure merit further investigation. In addition, more exposure scenarios should be simulated in future research to better approximate actual cases.

Conclusion
In this study, high-throughput sequencing was used to observe changes in the semen microbial community after 15 days of exposure under different simulated forensic conditions. The microbial communities of semen exposed in closed plastic bags and in soil changed significantly and became dominated by bacteria adapted to those environments, whereas the other exposure groups did not change significantly. In addition, compared with previously reported saliva, skin, and vaginal fluid samples, the exposed semen was still correctly identified as semen. In conclusion, the semen microbiome appears promising for body fluid identification, even after exposure.

FIG 3 Mean relative abundances (A, C) and Venn diagrams (B, D) of bacterial phyla (A, B) and genera (C, D) in the V3-V4 16S rDNA amplicons obtained from the samples of each group.

FIG 4 (A) Barplot of the differences in relative abundance of each taxon among groups (Kruskal-Wallis H test). (B, C) LEfSe results showing the differentially abundant taxa identified at different taxonomic levels among groups.

FIG 5 (A) Hierarchical clustering tree based on the Bray-Curtis distance matrix, showing the distances between sample branches; samples can be divided into groups at different distance thresholds. (B, C) Non-metric multidimensional scaling (NMDS) plots of samples in the different exposure groups. Points of different colors represent samples from different groups; the closer two points are, the more similar the species composition of the two samples.

FIG 6 (A) Bar graph of microbial community composition of semen, skin, saliva, and vaginal fluid.(B) The Circos graph illustrated the proportional distribution of dominant genera within each group.(C) The non-metric multidimensional scaling diagram of samples in the different body fluids.

FIG 7 (A) Model evaluation by using top important features.The X-axis represents the number of top important features, and the Y-axis represents the average prediction error rate using 10-fold cross-validation.(B) Bubble plot of the mean decrease accuracy values of the top 30 important features.

TABLE 1
The prediction results of the random forest classification model for the 53 test samples, including the predicted group and the distribution probability of each sample a
a Bold values indicate the highest distribution probability.